We consider the effects of parallelizing branch-and-bound algorithms by expanding several live nodes simultaneously. It is shown that it is quite possible for a parallel branch-and-bound algorithm using $n_2$ processors to take more time than one using $n_1$ processors even though $n_1 < n_2$. Furthermore, it is also possible to achieve speedups that are in excess of the ratio $n_2/n_1$. Experimental results with the 0/1 Knapsack and Traveling Salesperson problems are also presented.
1. Introduction

Branch-and-bound is a popular algorithm design technique that has been successfully used in the solution of problems that arise in various fields (e.g., combinatorial optimization, artificial intelligence, etc.) [1, 8-12]. We shall briefly describe the branch-and-bound method as used in the solution of combinatorial optimization problems. Our terminology is from Horowitz and Sahni [7].

In a combinatorial optimization problem we are required to find a vector $x = (x_1, x_2, \ldots, x_n)$ that optimizes some criterion function f(x) subject to a set C of constraints. This constraint set may be partitioned into two subsets: explicit and implicit constraints. Implicit constraints specify how the $x_i$s must relate to each other. Two examples are:

1) $\sum_{i=1}^{n} a_i x_i \le b$

2) $a_1 x_1^2 - a_2 x_1 x_2 + a_3 x_3 = 8$

Explicit constraints specify the range of values each $x_i$ can take. For example:

1) $x_i \in \{0, 1\}$

2) $x_i \ge 0$

The set of vectors that satisfy the explicit constraints defines the solution space. In a branch-and-bound approach this solution space is organized as a graph which is usually a tree. The resulting organization is called a state space graph (tree). All the state space graphs used in this paper are trees, so we shall henceforth refer only to state space trees. Figure 1 shows a state space tree for the case n = 3 and $x_i \in \{0, 1\}$. The path from the root to some of the nodes (in this case the leaves) defines an element of the solution space. Nodes with this property are called solution nodes. Solution nodes that satisfy the implicit constraints are called feasible solution nodes or answer nodes. Answer nodes have been drawn as double circles in Figure 1. The cost of an answer node is the value of the criterion function at that node. In solving a combinatorial optimization problem we wish to find a least cost answer node.
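To make these definitions concrete, the following Python sketch enumerates the solution nodes of the state space tree for n = 3 with $x_i \in \{0, 1\}$ and picks out the answer nodes. The implicit constraint (a weight limit) and the cost coefficients are made-up values chosen for illustration; they do not come from the paper.

```python
from itertools import product

# Hypothetical instance: weights, limit, and costs are illustrative assumptions.
WEIGHTS = (4, 3, 2)
LIMIT = 6
COSTS = (-5, -4, -3)  # we minimize f(x) = sum(c_i * x_i)


def is_feasible(x):
    """Implicit constraint: the weighted sum must not exceed LIMIT."""
    return sum(w * xi for w, xi in zip(WEIGHTS, x)) <= LIMIT


def f(x):
    """Criterion function evaluated at a solution node."""
    return sum(c * xi for c, xi in zip(COSTS, x))


# The explicit constraints x_i in {0, 1} define the solution space (the leaves).
solution_nodes = list(product((0, 1), repeat=3))

# Solution nodes that also satisfy the implicit constraint are answer nodes.
answer_nodes = [x for x in solution_nodes if is_feasible(x)]

# A least cost answer node is what the optimization problem asks for.
least_cost = min(answer_nodes, key=f)
```

Exhaustive enumeration like this visits all $2^n$ leaves; branch-and-bound aims to reach a least cost answer node while generating far fewer nodes.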


Figure 1: A state space tree


For convenience we assume that we wish to minimize f(x). With every node N in the state space tree, we associate a value $f_{\min}(N) = \min\{f(Q) : Q$ is a feasible solution node in the subtree $N\}$. (If there exists no such Q, then let $f_{\min}(N) = \infty$.)


While there are several types of branch-and-bound algorithms, we shall be concerned only with the more popular least cost branch-and-bound (lcbb). In this method a heuristic function g( ) with the following properties is used:

(P1) $g(N) \le f_{\min}(N)$ for every node N in the state space tree.

(P2) g(N) = f(N) for solution nodes representing feasible solutions (i.e., answer nodes).

(P3) $g(N) = \infty$ for solution nodes representing infeasible solutions.

(P4) $g(N) \ge g(P)$ if N is a child of P.

g( ) is called a bounding function. lcbb generates the nodes in a state space tree using g( ). A node that has been generated, can lead to a feasible solution, and whose children haven't yet been generated is called a live node. A list of live nodes (generally kept as a heap) is maintained. In each iteration of the lcbb a live node, N, with least g( ) value is selected. This node is called the current E-node. If N is an answer node, it must be a least cost answer node. If N is not an answer node, its children are generated. Children that cannot lead to a least cost answer node (as determined by some heuristic) are discarded. The remaining children are added to the list of live nodes.
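The loop just described can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' program: the tree `TREE`, the bounds `G`, and the answer set `ANSWERS` are hypothetical, chosen to satisfy (P1)-(P4), and the only discarding heuristic used is dropping children with an infinite bound.

```python
import heapq
import math
from itertools import count

# Hypothetical state space tree: node -> children, with g values per node.
TREE = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"]}
G = {"root": 1, "a": 2, "b": 3, "a1": 5, "a2": math.inf, "b1": 4}
ANSWERS = {"a1", "b1"}  # feasible solution nodes; g equals f here (P2)


def lcbb(root="root"):
    """Return (least cost answer node, number of E-node expansions)."""
    tie = count()  # tie-breaker so the heap never compares node names
    live = [(G[root], next(tie), root)]
    expansions = 0
    while live:
        _, _, e_node = heapq.heappop(live)  # live node with least g
        if e_node in ANSWERS:
            return e_node, expansions  # least g and an answer: optimal
        expansions += 1
        for child in TREE.get(e_node, []):
            if G[child] != math.inf:  # discard children that cannot help
                heapq.heappush(live, (G[child], next(tie), child))
    return None, expansions


result = lcbb()
```

On this toy tree the call returns the answer node `b1` after three expansions (root, `a`, `b`); node `a1` is generated but never selected because its bound exceeds the optimal cost.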

The  problem  of  parallelizing  lcbb  has  been  studied  earlier  [2  -  5,  13].  There 
are  essentially  three  ways  to  introduce  parallelism  into  lcbb: 

(1)  Expand  more  than  1  E-node  during  each  iteration. 

(2)  Evaluate  g(  )  and  determine  feasibility  in  parallel. 

(3)  Use  parallelism  in  the  selection  of  the  next  E-node(s). 

Wah and Ma [13] exclusively consider (1) above (though they point out (2) and (3) as possible sources of parallelism). If p processors are available then q = min{p, number of live nodes} live nodes are selected as the next set of E-nodes (these are the q live nodes with smallest g( ) values). Let $g_{\min}$ be the least g value among these q nodes. If any of these E-nodes is an answer node and has g( ) value equal to $g_{\min}$ then a least cost answer node has been found. Otherwise all q E-nodes are expanded and their children added to the list of live nodes. Each such expansion of q E-nodes counts as one iteration of the parallel lcbb. For any given problem instance and g, let I(p) denote the number of iterations needed when p processors are available. Intuition suggests that the following might be true about I(p):

(I1) $I(n_1) \ge I(n_2)$ whenever $n_1 < n_2$

(I2) $I(n_1)/I(n_2) \le n_2/n_1$ whenever $n_1 < n_2$

In Section 2, we show that neither of these two relations is in fact valid. Even if the g( )s are restricted beyond (P1) - (P4), these relations do not hold. The experimental results provided in Section 3 do, however, show that (I1) and (I2) can be expected to hold "most" of the time.
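The iteration count I(p) can be measured by simulation. The Python sketch below follows the selection rule described above on a small hypothetical tree (`TREE`, `G`, and `ANSWERS` are illustrative assumptions). Two further assumptions are labeled in the code: ties in g are broken by generation order, and an answer node that is selected but not yet proven optimal is kept in the live list.

```python
import math

# Hypothetical state space tree (not from the paper) and its g values.
TREE = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"]}
G = {"root": 1, "a": 2, "b": 3, "a1": 5, "a2": math.inf, "b1": 4}
ANSWERS = {"a1", "b1"}


def parallel_iterations(p, root="root"):
    """Return I(p), the number of iterations used with p processors."""
    live = [root]
    iterations = 0
    while live:
        live.sort(key=G.get)  # assumption: stable sort, ties keep generation order
        q = min(p, len(live))
        e_nodes, live = live[:q], live[q:]
        g_min = G[e_nodes[0]]
        if any(n in ANSWERS and G[n] == g_min for n in e_nodes):
            return iterations  # a least cost answer node has been found
        iterations += 1  # one parallel expansion of q E-nodes
        for n in e_nodes:
            if n in ANSWERS:
                live.append(n)  # assumption: unproven answer nodes stay live
            else:
                live.extend(c for c in TREE.get(n, []) if G[c] != math.inf)
    return iterations


counts = {p: parallel_iterations(p) for p in (1, 2, 4)}
```

For this tree the simulation gives I(1) = 3 and I(2) = I(4) = 2, so extra processors beyond two bring no further gain. Different tie-breaking rules can change the counts when many nodes share the same g value.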

Wah and Ma [13] experimented with the vertex cover problem using $2^k$, $0 \le k \le 6$, processors. Their results indicate that $I(1)/I(p) \approx p$. Our experiments with the 0/1-Knapsack and Traveling Salesperson problems indicate that $I(1)/I(p) \approx p$ only for "small" values of p (say p < 16).

2. Some Theorems For Parallel Branch-and-Bound

As remarked in the introduction, several anomalies occur when one parallelizes branch-and-bound algorithms by using several E-nodes at each iteration. In this section we establish these anomalies under varying constraints for the bounding function g( ). First, it should be recalled that the g( ) functions typically used (e.g., for the knapsack problem, the traveling salesperson problem, etc.; cf. [7]) have the following properties:

(a) $g(N) \ge g(M)$ whenever N is a child of node M. Thus, the g( ) values along any path from the root to a leaf form a nondecreasing sequence.

(b) Several nodes in the state space tree may have the same g( ) value. In fact, many nonsolution nodes may have a g( ) value equal to f*. This is particularly true of nodes that are near ancestors of solution nodes.

In constructing example state space trees, we shall keep (a) in mind. None of the trees constructed will violate (a) and we shall not explicitly make this point in further discussion. The first result we shall establish is that it is quite possible for a parallel branch-and-bound using $n_2$ processors to perform much worse than one using a smaller number $n_1$ of processors.

Theorem 1: Let $n_1 < n_2$. For any k > 0, there exists a problem instance such that $k I(n_1) < I(n_2)$.

Proof: Consider a problem instance with the state space tree of Figure 2. All nonleaf nodes have the same g( ) value equal to f*, the f value of the least cost answer node (node A). When $n_1$ processors are available, one processor expands the root and generates its $n_1 + 1$ children. Let us suppose that on iteration 2, the left $n_1$ nodes on level 2 get expanded. Of the $n_1$ children generated, $n_1 - 1$ get bounded and only one remains live. On iteration 3 the remaining live node on level 2 (B) and the one on level 3 are expanded. The level 3 node leads to the solution node and the algorithm terminates with $I(n_1) = 3$.


Figure 2: Instance for Theorem 1 (figure labels: "level 1", "3k-1 levels")


When $n_2$ processors are available, the root is expanded on iteration 1 and all $n_1 + 1$ live nodes from level 2 get expanded on iteration 2. The result is $n_2 + 1$ live nodes on level 3. Of these, only $n_2$ can be expanded on iteration 3. These $n_2$ could well be the rightmost $n_2$ nodes. And iterations 4, 5, ..., 3k could very well be limited to the rightmost subtree of the root. Finally, in iteration 3k + 1, the least cost answer node A is generated. Hence, $I(n_2) = 3k + 1$ and $k I(n_1) = 3k < I(n_2)$. []


In the above construction, all nodes have the same g( ) value, f*. While this might seem extreme, property (b) above states that it is not unusual for real g-functions to have a value f* at many nodes. The example of Figure 2 does serve to illustrate why the use of additional processors may not always be rewarding. The use of an additional processor can lead to the development of a node N (such as node B of Figure 2) that looks "promising" and eventually diverts all or a significant number of the processors into its subtree. When fewer processors are used, the upper bound U at the time this "promising" node is to get expanded might be such that $U \le g(N)$, and so N is not expanded.

The proof of Theorem 1 hinges on the fact that g(N) may equal f* for many nodes (independent of whether these nodes are least cost answer nodes or not). If we require the use of g-functions that can have the value f* only for least cost answer nodes, then Theorem 1 is no longer valid for all combinations of $n_1$ and $n_2$, $n_1 < n_2$. In particular, if $n_1 = 1$ then the use of more processors never increases the number of iterations (Theorem 2).

Definition:  A  node  N  is  critical  iff  g(N)  <  f*. 

Theorem 2: If $g(N) \ne f^*$ whenever N is not a least cost answer node, then $I(1) \ge I(n)$ for n > 1.

Proof: When the number of processors is 1, only critical nodes and least cost answer nodes can become E-nodes (as whenever an E-node is to be selected there is at least one node N with g(N) < f* in the list of live nodes). Furthermore, every critical node becomes an E-node by the time the branch-and-bound algorithm terminates. Hence, if the number of critical nodes is m, I(1) = m.

When n > 1 processors are available, some noncritical nodes may become E-nodes. However, at each iteration, at least one of the E-nodes must be a critical node. So, $I(n) \le m$. Hence, $I(1) \ge I(n)$. []

When $n_1 \ne 1$, a degradation in performance is possible with $n_2 > n_1$ even if we restrict the g( )s as in Theorem 2.

Theorem 3: Assume that $g(N) \ne f^*$ whenever N is not a least cost answer node. Let $1 < n_1 < n_2$ and k > 0. There exists a problem instance such that $I(n_1) + k \le I(n_2)$.

Proof: Figures 3(a) and 3(b) show two identical subtrees T. Assume that all nodes have the same g( ) value and are critical. The numbers inside each node give the iteration number in which that node becomes an E-node when $n_1$ processors are used (Figure 3(a)) and when $n_2$ processors are used (Figure 3(b)). Other evaluation orders are possible. However, the ones shown in Figures 3(a) and 3(b) will lead to a proof of this theorem.

We can construct a larger state space tree by connecting together k copies of T (Figure 3(c)). The B node of one copy connects to the A node (root) of the next. Each triangle in this figure represents a copy of T. The least cost answer node is the child of the B node of the last copy of T. It is clear that for the state space tree of Figure 3(c), $I(n_1) = jk$ while $I(n_2) = (j + 1)k$ (that is, a single copy of T takes j iterations with $n_1$ processors and j + 1 with $n_2$ processors). Hence, $I(n_1) + k = I(n_2)$. []

The assumption that $g(N) \ne f^*$ when N is not a least cost answer node is not too unrealistic, as it is often possible to modify typical g( )s so that they satisfy this requirement. The example of Figure 3 has many nodes with the same g( ) value and so we might wonder what would happen if we restricted the g( )s so that only least cost answer nodes can have the same g( ) value. This restriction on g( ) is quite severe and, in practice, it is often not possible to guarantee that the g( ) in use satisfies this restriction. However, despite the severity of the restriction one cannot guarantee that there will be no degradation of performance using $n_2$ processors when $n_1 < n_2 < 2(n_1 - 1)$. We have unfortunately been unable to extend our result of Theorem 4 to the case when $n_2 \ge 2(n_1 - 1)$. So, it is quite possible that no degradation is possible when the number of processors is (approximately) doubled and g( ) is restricted as above.

Theorem 4: Let $n_1 < n_2 < 2(n_1 - 1)$ and let k > 0. There exists a g( ) and a problem instance that satisfy the following properties:

(a) $g(N_1) \ne g(N_2)$ unless both of $N_1$ and $N_2$ are least cost answer nodes.

(b) $I(n_1) + k \le I(n_2)$.

Figure 3: Instance for Theorem 3

Proof: Consider the state space tree of Figure 4(a). The number outside each node is its g( ) value while the number inside a node gives the iteration in which that node is the E-node when $n_1$ processors are used. It takes $n_1$ processors 4 iterations to get to and evaluate node B. When $n_2$ processors are available, $n_1 < n_2 < 2(n_1 - 1)$, the iteration numbers are as given in Figure 4(b). This time 5 iterations are needed. Combining k copies of this tree and setting the g( )
The remaining results we shall establish in this section are concerned with the maximum improvement in performance one can get in going from $n_1$ to $n_2$ processors, $n_1 < n_2$. Generally, one would expect that the performance can increase by at most $n_2/n_1$. This is not true for branch-and-bound. In fact, Theorem 5 shows that using g( )s that satisfy properties (a) and (b), an unbounded improvement in performance is possible. The reason for this is much the same as for the possibility of an unbounded loss in performance. The additional processors might enable us to improve the upper bound quickly, thereby curtailing the expansion of some of the nodes that might get expanded without these processors.

Theorem 5: Let $n_1 < n_2$. For any $k > n_2/n_1$, there exists a problem instance for which $I(n_1)/I(n_2) \ge k > n_2/n_1$.

Proof: Simply consider the state space tree of Figure 5(a). All nodes have the same g( ) value, f*. Assume that when $n_1$ processors are used, the $n_1$ nodes at the left end of level i become E-nodes on iteration i, $2 \le i \le 2k$. Hence, $I(n_1) = 2k + 1$. When $n_2 > n_1$ processors are used, $I(n_2) = 2$ (Figure 5(b)) and $I(n_1)/I(n_2) > k$. []
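The arithmetic behind this proof is easy to check. The short Python sketch below plugs the iteration counts stated in the proof, $I(n_1) = 2k + 1$ and $I(n_2) = 2$, into the speedup ratio; the particular values of $n_1$, $n_2$, and k are illustrative assumptions.

```python
from fractions import Fraction


def theorem5_speedup(k):
    """Speedup I(n1)/I(n2) for the proof's counts I(n1) = 2k + 1, I(n2) = 2."""
    return Fraction(2 * k + 1, 2)


# Illustrative values only; any k > n2 / n1 works in the theorem.
n1, n2, k = 2, 4, 10
speedup = theorem5_speedup(k)
exceeds_processor_ratio = speedup > k > Fraction(n2, n1)
```

With these values the speedup is 21/2, well above the processor ratio $n_2/n_1 = 2$, and it grows without bound as k increases.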

As in the case of Theorem 2, we can show that when $g(N) \ne f^*$ whenever N is not a least cost answer node, $I(1)/I(n) \le n$.

Theorem 6: Assume that $g(N) \ne f^*$ whenever N is not a least cost answer node. Then $I(1)/I(n) \le n$ for n > 1.

Proof: From the proof of Theorem 2, we know that I(1) = m where m is the number of critical nodes. Since all critical nodes must become E-nodes before the branch-and-bound algorithm can terminate, $I(n) \ge m/n$. Hence, $I(1)/I(n) \le n$. []
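Taken together, the proofs of Theorems 2 and 6 bracket I(n) between m/n and m, where m is the number of critical nodes. A small Python helper, given here only as an illustration (the function name and the sample values are assumptions), makes the bracket explicit:

```python
def iteration_bounds(m, n):
    """Bounds on I(n) for m critical nodes, assuming g(N) != f* off the optimum.

    Lower bound: at most n critical nodes can become E-nodes per iteration.
    Upper bound: at least one critical node becomes an E-node per iteration.
    """
    lower = -(-m // n)  # integer ceiling of m / n
    upper = m
    return lower, upper


bounds = iteration_bounds(100, 8)
```

For 100 critical nodes and 8 processors, I(8) lies between 13 and 100, so the speedup over one processor is between 1 and 8.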

When $1 < n_1 < n_2$ and g(N) is restricted as above, $I(n_1)/I(n_2)$ can exceed $n_2/n_1$ but cannot exceed $n_2$.

Figure 5: Instance for Theorem 5

Theorem 7: Assume that $g(N) \ne f^*$ whenever N is not a least cost answer node. Let $1 < n_1 < n_2$. The following are true:

(1) $I(n_1)/I(n_2) \le n_2$.

(2) There exists a problem instance for which $I(n_1)/I(n_2) > n_2/n_1$.


Proof: (1) From Theorems 2 and 6, we immediately obtain:

$$\frac{I(n_1)}{I(n_2)} = \frac{I(n_1)}{I(1)} \cdot \frac{I(1)}{I(n_2)} \le \frac{I(1)}{I(n_2)} \le n_2.$$


(2) For simplicity, we assume that $k = n_2/n_1$ is an integer. Consider the state space tree of Figure 6. The g( ) value of all nodes other than the one representing the least cost answer is less than f*. The number inside (outside) a node is the iteration in which it is the E-node when $n_1$ ($n_2$) processors are used. We see that $I(n_1) = n_2(k + 1) + 1$ and $I(n_2) = n_2 + 2$. Hence,

$$\frac{I(n_1)}{I(n_2)} = \frac{n_2(k + 1) + 1}{n_2 + 2} > k = \frac{n_2}{n_1}.$$

[]
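The iteration counts stated for this construction, $I(n_1) = n_2(k + 1) + 1$ and $I(n_2) = n_2 + 2$ with $k = n_2/n_1$, can be checked numerically. The Python sketch below computes the resulting ratio exactly; the function name and the sample processor counts are illustrative assumptions.

```python
from fractions import Fraction


def theorem7_ratio(n1, n2):
    """I(n1)/I(n2) for the stated counts, assuming n1 divides n2 and 1 < n1 < n2."""
    if not (1 < n1 < n2) or n2 % n1 != 0:
        raise ValueError("need 1 < n1 < n2 with n1 dividing n2")
    k = n2 // n1
    i_n1 = n2 * (k + 1) + 1
    i_n2 = n2 + 2
    return Fraction(i_n1, i_n2)


# Check both parts of Theorem 7 on a few illustrative processor counts.
pairs = [(2, 4), (2, 8), (4, 16), (8, 64)]
checks = [
    Fraction(n2, n1) < theorem7_ratio(n1, n2) <= n2
    for n1, n2 in pairs
]
```

For example, with $n_1 = 2$ and $n_2 = 8$ the ratio is 41/10, which exceeds $n_2/n_1 = 4$ while staying below $n_2 = 8$.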


3. Experimental Results

In order to determine the frequency of the anomalous behavior described in the previous section, we simulated a parallel branch-and-bound with $2^k$ processors for k = 0, 1, 2, ..., 9. Two test problems were used: 0/1-Knapsack and Traveling Salesperson. These are described below.

0/1-Knapsack:

In this problem we are given n objects and a knapsack with capacity M. Object i has associated with it a profit $p_i$ and a weight $w_i$. We wish to place a subset of the n objects into the knapsack such that the knapsack capacity is not exceeded and the sum of the profits of the objects in the knapsack is maximum. Formally, we wish to solve the following problem:

maximize $\sum_{i=1}^{n} p_i x_i$

subject to $\sum_{i=1}^{n} w_i x_i \le M$, $x_i \in \{0, 1\}$.

Horowitz and Sahni [7] describe two state space trees that could be used to solve this problem. One results from what they call the fixed tuple size formulation. This is a binary tree such as the one shown in Figure 7(a) for the case n = 3. The other results from the variable tuple size formulation. This is an n-ary tree. When n = 3, the resulting tree is as in Figure 7(b). The bounding function used is the same as the one described in [7]. Since the bounding function requires that objects be ordered such that $p_i/w_i \ge p_{i+1}/w_{i+1}$, $1 \le i < n$, we generated our test data by first generating random $w_i$s. The $p_i$s were then computed from the $w_i$s by using a random nonincreasing sequence $f_1, f_2, \ldots, f_n$ and the equation $p_i = f_i w_i$. We generated 100 instances with n = 50 and 60 instances with n = 100. These 160 instances were solved using the binary state space tree described above. (We also tried the n-ary state space tree but found that it would take several weeks of computer time to complete our simulation. The reason is that when n-ary state space trees are used a great number of nodes are generated, and the queue of live nodes exceeds the capacity of main memory and has to be moved to secondary storage. In our program, it is time consuming to maintain a queue of live nodes that must be partly stored in secondary storage.)

Figure 7
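The data generation procedure can be sketched as follows. This is a minimal sketch under stated assumptions rather than the authors' generator: the weight range, the range of the ratios $f_i$, and the choice of capacity M (half the total weight) are not given in the paper and are made up here. Exact fractions are used so that the required ordering holds without rounding error.

```python
import random
from fractions import Fraction


def generate_knapsack_instance(n, seed=0, max_weight=100):
    """Return (profits, weights, capacity) with p_i/w_i nonincreasing in i."""
    rng = random.Random(seed)
    # Step 1: random weights w_i (range is an assumption).
    weights = [rng.randint(1, max_weight) for _ in range(n)]
    # Step 2: a random nonincreasing sequence f_1 >= f_2 >= ... >= f_n.
    ratios = sorted(
        (Fraction(rng.randint(50, 200), 100) for _ in range(n)),
        reverse=True,
    )
    # Step 3: profits from the equation p_i = f_i * w_i.
    profits = [f * w for f, w in zip(ratios, weights)]
    # Assumption: the paper does not state how the capacity M was chosen.
    capacity = sum(weights) // 2
    return profits, weights, capacity


profits, weights, capacity = generate_knapsack_instance(50, seed=1)
```

Because the ratios are sorted before the profits are computed, the objects come out already ordered as the bounding function requires, and no separate sorting pass over the objects is needed.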

Table 1 gives the average values for I(p), I(1)/I(p) and I(p)/I(2p). From Table 1, we see that when n = 50, I(1)/I(p) is significantly less than p for p > 2. The observed improvement in performance is not as high as one might expect. Similarly, the ratio I(p)/I(2p) drops rapidly to 1 and is acceptable only for p = 1 and 2 (see also Figure 8). In none of the 100 instances tried for n = 50 did we observe anomalous behavior, i.e., it was never the case that I(p) < I(2p) or that I(p) > 2I(2p).
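The two anomaly conditions, I(p) < I(2p) and I(p) > 2I(2p), are straightforward to test once the iteration counts of an instance are known. The Python sketch below does this for one instance; the function name, the labels, and the sample counts are hypothetical and are not measurements from the paper.

```python
def find_anomalies(iterations):
    """Return (p, kind) pairs for each anomalous step from p to 2p processors.

    `iterations` maps a processor count p to the iteration count I(p).
    """
    anomalies = []
    for p in sorted(iterations):
        if 2 * p not in iterations:
            continue
        i_p, i_2p = iterations[p], iterations[2 * p]
        if i_p < i_2p:
            anomalies.append((p, "slowdown"))  # I(p) < I(2p)
        elif i_p > 2 * i_2p:
            anomalies.append((p, "superlinear"))  # I(p) > 2 I(2p)
    return anomalies


# Hypothetical iteration counts for a single instance.
sample = {1: 148, 2: 10, 4: 66, 8: 40}
found = find_anomalies(sample)
```

In this made-up instance, doubling from 1 to 2 processors is flagged as superlinear and doubling from 2 to 4 as a slowdown, while the step from 4 to 8 is unremarkable.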

When n = 100, the ratio I(1)/I(p) is significantly less than p for p > 8 (see also Figure 9). Of the 60 instances run, 6 (or 10%) exhibited anomalous behavior. For all 6 of these there was at least one p for which I(p) > 2I(2p). There was only one case where I(p) < I(2p). The values of I(p), I(1)/I(p), and I(p)/I(2p) for these six instances are given in Table 2. It is striking to note the instance for which I(1)/I(2) = 14.8 and I(2)/I(4) = 0.15.

           n = 50                            n = 100
   p    I(p)   I(1)/I(p)   I(p)/I(2p)    I(p)   I(1)/I(p)   I(p)/I(2p)
   1                                               1.00        2.19
   2             1.87        1.68        1351      2.19        1.85
   4    106      3.17        1.42         754      3.69        1.75
   8     70      4.66        1.22         402      6.47        1.60
  16     56      5.97        1.09         232     10.58        1.35
  32     51      6.84        1.03         162     14.94        1.22
  64     50      7.23                     126     19.35        1.14
 128     50      7.26        1.00         108     23.68        1.05
 256     50      7.26        1.00         102     25.84        1.02
 512     50      7.26        1.00         100     27.06        1.01
1024     50      7.26                             27.68

Table 1: Experimental results (knapsack)

The  Traveling  Salesperson  Problem: 

Here we are given an n vertex undirected complete graph. Each edge is assigned a weight. A tour is a cycle that includes every vertex (i.e., it is a Hamiltonian cycle). The cost of a tour is the sum of the weights of the edges on the tour. We wish to find a tour of minimum cost.

The branch-and-bound strategy that we used is a simplified version of the one proposed by Held and Karp [6]. Vertex 1 is chosen as the start vertex. There are n - 1 possibilities for the next vertex and n - 2 for the preceding vertex (assume n > 2). This leads to (n - 1)(n - 2) sequences of 3 vertices each. Half of these may be discarded as they are symmetric to other sequences. Any sequence with an edge having infinite weight may also be discarded. Paths are expanded one vertex at a time using the set of vertices adjacent to the end of the path. A lower bound for the path $(i_1, i_2, \ldots, i_k)$ is obtained by computing the cost of the minimum spanning tree for $\{1, 2, \ldots, n\} - \{i_1, i_2, \ldots, i_k\}$ and adding an edge from each of $i_1$ and $i_k$ to this spanning tree in such a way that these edges connect to the two nearest vertices in the spanning tree.
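A bound of this kind can be sketched in Python as follows. The sketch is one reading of the description above and not the authors' code: it uses Prim's algorithm for the spanning tree, numbers vertices from 0, and, as an explicit assumption, adds the cost of the edges already on the path so that the result bounds the cost of any tour extending the path. The function names and the weight matrix are hypothetical.

```python
import math


def mst_cost(vertices, w):
    """Cost of a minimum spanning tree on `vertices` (Prim); inf if disconnected."""
    vertices = list(vertices)
    if not vertices:
        return 0
    best = {v: math.inf for v in vertices[1:]}  # cheapest link into the tree
    current, total = vertices[0], 0
    while best:
        for v in best:
            best[v] = min(best[v], w[current][v])
        current = min(best, key=best.get)
        total += best.pop(current)
    return total


def path_lower_bound(path, w):
    """Lower bound on the cost of any tour that contains `path` as a segment."""
    rest = [v for v in range(len(w)) if v not in path]
    path_cost = sum(w[a][b] for a, b in zip(path, path[1:]))  # assumption: included
    if not rest:
        return path_cost + w[path[-1]][path[0]]  # only the closing edge remains
    link_first = min(w[path[0]][v] for v in rest)  # nearest tree vertex to i_1
    link_last = min(w[path[-1]][v] for v in rest)  # nearest tree vertex to i_k
    return path_cost + mst_cost(rest, w) + link_first + link_last


# Hypothetical symmetric weight matrix on 4 vertices.
W = [
    [0, 1, 4, 3],
    [1, 0, 2, 5],
    [4, 2, 0, 6],
    [3, 5, 6, 0],
]
bound_from_start = path_lower_bound([0], W)
bound_three_vertices = path_lower_bound([3, 0, 1], W)
```

The bound is valid because the unvisited vertices must be joined by a Hamiltonian path, which costs at least as much as their minimum spanning tree, and each end of the current path needs at least one edge into that set. For this matrix the cheapest tour costs 12, and the two bounds computed above are 9 and 12.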
Figure 8: Knapsack with 50 objects


In our experiment with the traveling salesperson problem we generated 45 instances each having 20 vertices. The weights were assigned randomly. However, each edge had a finite weight with probability 0.35. Use of a much higher probability results in instances that take years of computer time to solve by the branch-and-bound method.


Table  2:  Data  exhibiting  anomalous  behavior 


Those 45 instances were solved using $p = 2^k$, $0 \le k \le 9$, processors. The average values of I(p), I(1)/I(p), and I(p)/I(2p) are tabulated in Table 3. As can be seen, for p < 32 the average value of I(1)/I(p) is quite close to p and the average value of I(p)/I(2p) is quite close to 2 (see also Figure 10). No anomalies were observed for any of these 45 instances.
   p    I(p)   I(1)/I(p)   I(p)/I(2p)
   1    3974      1.000       1.996
   2    1989      1.996       1.990
   4     996      3.973       1.976
   8     500      7.849       1.943
  16     252     15.258       1.873
  32     129     28.685       1.753
  64      68     51.126       1.609
 128      39     85.378       1.417
 256      25    129.411       1.252
 512      19    177.459

Table 3: Experimental results (traveling salesperson)

4.  Conclusions 

We have demonstrated the existence of anomalous behavior in parallel branch-and-bound. Our experimental results indicate that such anomalous behavior will be rarely witnessed in practice. Furthermore, there is little advantage to expanding more than k nodes in parallel, where k will in general depend on both the problem and the problem size being solved. If we require I(p)/I(2p) to be at least 1.66, then for the knapsack problem with n = 50, k is between 4 and 8 whereas with n = 100 it is between 8 and 16 (based on our experimental results). For the traveling salesperson problem with 20 vertices k is between 8 and 16. If p is larger than k, then more effective use of the processors is made when they are divided into k groups each of size approximately p/k. Each group of processors is used to expand a single E-node in parallel. If s is the speedup obtained by expanding an E-node using q processors, then allocating q processors to each E-node and expanding only p/q E-nodes in parallel is preferable to expanding p E-nodes in parallel provided that $s \cdot I(1)/I(p/q) > I(1)/I(p)$.
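The closing criterion can be evaluated directly from measured iteration counts. The Python sketch below does so using the average I(p) values listed in Table 3 for the traveling salesperson runs; since these are averages over instances, the outcome is indicative only. The function names and the sample speedups s are assumptions made for illustration.

```python
# Average I(p) values for the traveling salesperson runs, as listed in Table 3.
I_TSP = {1: 3974, 2: 1989, 4: 996, 8: 500, 16: 252,
         32: 129, 64: 68, 128: 39, 256: 25, 512: 19}


def grouping_is_preferable(iterations, p, q, s):
    """True if s * I(1)/I(p/q) > I(1)/I(p), i.e. groups of q processors win."""
    i1 = iterations[1]
    return s * i1 / iterations[p // q] > i1 / iterations[p]


def break_even_speedup(iterations, p, q):
    """Grouping wins exactly when s exceeds I(p/q) / I(p)."""
    return iterations[p // q] / iterations[p]


# With 512 processors, is it better to expand 256 E-nodes using pairs?
threshold = break_even_speedup(I_TSP, 512, 2)
pairs_win = grouping_is_preferable(I_TSP, 512, 2, s=1.5)
pairs_lose = grouping_is_preferable(I_TSP, 512, 2, s=1.2)
```

At p = 512 the break-even speedup is 25/19, about 1.32, so even a modest per-node speedup from pairing processors would pay off there. At p = 4 the break-even value is 1989/996, about 2.0, which two processors per E-node cannot exceed.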


Figure 10: Traveling Salesperson (horizontal axis: p, number of processors)

References

1. N. Agin, "Optimum seeking with branch-and-bound," Manage. Sci., Vol. 13, pp. B176-B185.

2. B. Desai, "The BPU, a staged parallel processing system to solve the zero-one problem," Proceedings of ICS '76, 1976, pp. 802-617.

3. B. Desai, "A parallel microprocessing system," Proceedings of the 1979 International Conference on Parallel Processing, 1979.

4. O. El-Dessouki and W. Huen, "Distributed enumeration on network computers," IEEE Transactions on Computers, C-29, 1980, pp. 818-825.

5. J. Harris and D. Smith, "Hierarchical multiprocessor organizations," Proceedings of the 4th Annual Symposium on Computer Architecture, 1977, pp. 41-48.

6. M. Held and R. Karp, "The traveling salesman problem and minimum spanning trees: part II," Math. Prog., 1, pp. 6-25, 1971.

7. E. Horowitz and S. Sahni, Fundamentals of Computer Algorithms, Computer Science Press, Inc., 1978.

8. E. Ignall and L. Schrage, "Application of the branch-and-bound technique to some flow-shop scheduling problems," Oper. Res., 13, pp. 400-418, 1965.

9. W. Kohler and K. Steiglitz, "Enumerative and iterative computational approaches," in E. Coffman (ed.), Computer and Job-Shop Scheduling Theory, John Wiley & Sons, Inc., New York, 1976, pp. 229-287.

10. E. Lawler and D. Wood, "Branch-and-bound methods: a survey," Oper. Res., 14, pp. 699-719, 1966.

11. L. Mitten, "Branch-and-bound methods: general formulation and properties," Oper. Res., 18, pp. 24-34, 1970.

12. N. Nilsson, Problem Solving Methods in Artificial Intelligence, McGraw-Hill, New York, 1971.

13. B. Wah and Y. Ma, "MANIP - a parallel computer system for implementing branch-and-bound algorithm," Proceedings of the 8th Annual Symposium on Computer Architecture, 1982, pp. 239-262.


